ABSTRACT


Low  image  contrast  limits  the  amount  of  information  conveyed  to  the  user. 
With the proliferation of digital imagery and computer interfaces between man
and machine, it is now viable to digitally enhance an image before
presenting it to the user, thus increasing the information throughput. This thesis
explores  the  effect  of  the  Contrast  Limited  Adaptive  Histogram  Equalization 
(CLAHE)  process  on  night  vision  and  thermal  images.  With  better  contrast,  target 
detection  and  discrimination  can  be  improved.  The  contrast  enhancement  by 
CLAHE  is  visually  significant  and  details  are  easier  to  detect  with  the  higher 
image  contrast.  Analyzing  the  image  frequency  response  reveals  increases  in  the 
higher  spatial  frequencies.  As  higher  frequencies  correspond  to  image  edges,  the 
power  increase  is  viewed  as  corresponding  to  edge  enhancement  and  hence,  an 
increase in visible image details. This edge enhancement is perceived as
an improvement in image quality. This is further substantiated by subjective
testing, in which a majority of human subjects agreed that CLAHE-enhanced
images are more informative than the original night vision images.
I. INTRODUCTION


A.  BACKGROUND 

The  element  of  surprise  has  long  been  touted  as  the  main  tactical 
advantage  that  would  turn  the  tide  of  a  battle.  Throughout  history,  commanders 
have  employed  the  darkness  of  night  to  gain  surprise  and  to  grasp  the  initiative 
from  the  hands  of  the  enemy.  Yet,  while  night  operations  have  progressed  from 
nocturnal  maneuvers  to  the  more  recent  firefights  in  Afghanistan  and  the  “24- 
hour  battlefield”,  difficulties  associated  with  night  operations  still  plague  all 
commanders,  particularly  the  ability  to  see  clearly  and  the  ability  to  differentiate 
friend-or-foe.  The  fact  remains  that  darkness  is  "a  double-edged  weapon",  and 
like  terrain,  "it  favors  the  one  who  best  uses  it  and  hinders  the  one  who  does 
not."  [Sasso,  1982]. 

Human beings are visual, non-nocturnal creatures by nature. Not
gifted with any special or hyper-sensitive sensory organs, they rely more on their
ability to see than on any of the other four senses (smell, hearing, touch and taste)
to understand and manipulate their surroundings. The cone and rod
photoreceptors in the human eye are responsible for generating these
visual sensations. The rods are more numerous than the cones and more sensitive in
low levels of illumination (by more than a factor of one thousand). They are
chiefly responsible for our limited night, or scotopic, vision. However, the rods are not sensitive
to color like the cones, i.e. they only generate monochrome images. Hence,
objects that appear brightly colored in daylight appear as colorless forms
when seen under moonlight, because only the rods are stimulated.

In the absence of artificial light sources, the main source of natural
illumination at night is the moon and, to a lesser degree, the stars
(estimated at one-tenth of a quarter moon). The illuminance ranges
from 0.1 lux (full moon) to 0.0001 lux (overcast night) [Sampson, 1996].
Depending on the reflectivity of the objects, the eventual irradiance on the human
eye may not be high enough to even stimulate the rods.
However, if we explore beyond the visible light spectrum (400-700 nm),
the  Infra-Red  (IR)  spectrum  offers  possibilities  for  exploitation  as  reflected  in 
Figures  1  and  2.  Both  the  night  luminance  and  the  foliage  reflectivity  are  higher  in 
the  Near  Infra-Red  (NIR)  band,  i.e.  there  is  more  light  energy  in  this  wavelength 
band. 


Figure 1: Natural night sky spectral irradiance, showing a higher
irradiance in the NIR band [From Korry, 2003].


Figure 2: Foliage reflectivity: foliage is a better reflector in the IR
band [From Korry, 2003].
Hence, if we were able to “sense” IR or near-IR radiation (which the
human  photoreceptors  are  unable  to  do  naturally),  our  night  vision  capability 
would  be  immediately  improved,  given  the  higher  luminance  available. 


B.  NIGHT  VISION 

There are two basic methods to improve night vision. The first is to
increase the amount of visible light reaching the eye, either with artificial lighting
such as a flashlight or by converting “otherwise-invisible” radiation to visible
radiation. The second is light amplification, i.e. increasing
normally imperceptible radiation energy to a level detectable by the human eye.
These methods of achieving night imagery are employed by the Image Intensifier
(II) and the Thermal Imager (TI).

1.  Image  Intensifier 

As the name implies, Image Intensifiers (II) are designed to boost very low
intensity optical images to the point where they become perceivable to the
human eye. They also act as wavelength “down-converters”, that is, they convert
near-IR radiation into visible radiation. II devices are commonly known as Night
Vision Devices (NVD) or Night Vision Goggles (NVG), depending on the mode of
usage.


Figure 3: A Night Vision Device with the light-amplifying
microchannel plate [From Korry, 2003].
A typical II system consists of three main components: the photocathode,
the  micro-channel  plate  (MCP)  and  the  phosphor  screen,  as  shown  in  Figure  3. 
Reflected  light  from  the  scene  or  object  enters  the  device  and  is  focused  onto  the 
photocathode  by  an  optical  lens  system.  Photons  striking  the  photocathode 
surface  release  photo-electrons.  The  flux  of  photo-electrons  generated  is 
proportional  to  the  flux  of  incident  light  photons  and  the  responsivity  of  the 
photocathode. In first-generation NVDs, the energy of the photo-electrons is
increased  by  acceleration  with  an  externally  applied  electric  field.  Second- 
generation  devices  make  use  of  the  MCP  to  achieve  energy  gain  through 
electron  multiplication.  The  actual  number  of  photo-electrons  is  multiplied  by 
accelerating  the  electrons  through  the  MCP  where  an  “avalanche”  of  secondary 
electrons  is  produced  as  a  result  of  collisions  between  the  electrons  and  the 
MCP wall. On emerging from the MCP, the electrons strike a phosphor screen
which emits visible light, creating an image visible to the human eye. The
most commonly used phosphor is KA(P20), as it emits a greenish light at 560 nm,
matching the peak sensitivity of the human eye. Furthermore, the P20 has a fast
decay time and high conversion efficiency, which are ideal for night vision purposes
[Ji, 2002].

The newer generation (Gen III) of NVDs uses a Gallium Arsenide (GaAs)
photocathode which is sensitive to light beyond 800 nm, where the night sky
irradiance is also higher (Figure 1). The MCP used in third-generation
NVDs also has a much smaller pitch, thus giving better spatial
resolution. As a result, Gen III NVDs can deliver a three-fold improvement in
visual acuity and detection distance over the earlier generations. The light
amplification achievable can be 30,000 times or more [LCEO, 2003].




2.  Thermal  Imager 

All material objects with temperatures above absolute zero (0 K) radiate
infrared energy. A Thermal Imager (TI) detects this radiation (including reflected
infrared energy) and converts it into a visible presentation. The
most common class of TI systems is the Forward-Looking Infrared (FLIR) system. A
system operating in the 8- to 14-µm region is usually referred to as an LWIR
(long-wavelength infrared) FLIR, and one operating in the 3- to 5-µm region as an MWIR
(medium-wavelength infrared) FLIR. These are the two transmission windows
where atmospheric attenuation of infrared radiation is minimal.

Most  IR  detectors  operate  using  quantum  mechanical  interaction  between 
incident  photons  and  detector  material.  Photoconductive  detectors  absorb 
photons  to  elevate  electrons  from  the  valence  band  to  the  conduction  band  of  the 
material,  changing  the  conductivity  of  the  detector.  Photovoltaic  detectors  absorb 
photons to create electron-hole pairs across a p-n junction, which produces a
small  current.  Such  devices  can  be  manufactured  as  part  of  an  array  that 
includes  a  capacitor  that  stores  a  charge  proportional  to  the  incident  radiation. 
The  charged  array  can  then  be  read  or  scanned  to  produce  the  corresponding 
image. 

As the TI senses temperature difference or contrast (sensitivity is
frequently  defined  in  terms  of  Minimum  Resolvable  Temperature  Difference), 
detectors  with  small  band-gap  energies  must  be  cooled  to  minimize  thermally 
generated  carriers  and  inherent  detector  noise. 

The  bolometer  is  a  thermal  detector  that  absorbs  thermal  energy  over  all 
wavelengths  and  changes  its  resistance  accordingly.  The  change  in  resistance 
will  produce  a  change  in  electric  current  which  can  be  monitored.  The  radiation  to 
the  bolometer  is  usually  modulated  to  improve  sensitivity  and  uniformity  [Holst, 
2003]. 
C. CONTRAST SENSITIVITY

The  difference  in  radiation  intensity  levels  (both  emitted  and  reflected) 
from  a  scene  creates  the  information  contained  within  an  image.  An  object  of 
interest  can  be  identified  by  its  contrast  against  its  immediate  surroundings, 
which  defines  the  object’s  boundaries  and  edges.  Contrast  is  defined  as  the 
difference  in  luminance  or  radiation  intensity  levels  between  regions  or  pixels. 

The larger the contrast, the more easily an object can be detected in the
scene. This is illustrated by Figure 4, a Contrast Sensitivity Function (CSF)
test image produced by Campbell and Robson in 1968.


Figure 4: Contrast Sensitivity Function test chart by Campbell-Robson
[From McCourt, 2003].

In Figure 4 above, spatial frequency increases from left to right (the bars
become thinner and thinner) and contrast decreases from bottom to top
(the difference in gray level between the bars and the background decreases). From a
fixed viewing distance, note the contrast values at which the bars are just barely
visible over the range of spatial frequencies. Tracing these out forms an inverted
U-shaped curve, which represents your contrast sensitivity function. The
region below the inverted U-shaped curve is the visible stimuli region: objects with
such combinations of spatial frequency and contrast will be detectable by the eye.
The CSF of a typical adult human is shown in Figure 5 for reference. The
influence of contrast on visible stimulus and object detection is evident.
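
The layout of such a test chart can be sketched in a few lines of Python with NumPy. This is only an illustrative approximation, not the chart reproduced in Figure 4: the function name, image size, frequency range and contrast range are all assumed values, and the frequencies are in cycles per image width rather than cycles/degree, since the latter depends on viewing distance.

```python
import numpy as np

def campbell_robson_chart(height=256, width=512, f_min=1.0, f_max=64.0,
                          c_min=0.005, c_max=1.0):
    """Return an 8-bit grating whose spatial frequency rises from left to
    right and whose contrast falls from the bottom row to the top row.
    All default values are illustrative assumptions."""
    x = np.linspace(0.0, 1.0, width)
    # Logarithmic frequency sweep: f(x) = f_min * (f_max / f_min) ** x.
    # The phase is the integral of 2*pi*f(x) from 0 to x.
    k = np.log(f_max / f_min)
    phase = 2.0 * np.pi * f_min * (np.exp(k * x) - 1.0) / k
    grating = np.sin(phase)                       # values in [-1, 1]
    # Logarithmic contrast ramp: row 0 (top) has c_min, last row has c_max.
    contrast = np.logspace(np.log10(c_min), np.log10(c_max), height)
    image = 0.5 + 0.5 * contrast[:, None] * grating[None, :]
    return np.round(255.0 * image).astype(np.uint8)

chart = campbell_robson_chart()
```

Displaying `chart` as a grayscale image and viewing it from a fixed distance should give the same qualitative effect described above: the bars fade out at different heights for different spatial frequencies.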


Figure 5: CSF of an adult human (horizontal axis: spatial frequency in
cycles/degree). Contrast sensitivity is defined as the inverse of the
contrast threshold, which is the minimum contrast level needed to
see the grating in the test image [From McCourt, 2001].


1.  II  Imagery 

Figure 6 is a typical II image obtained by an NVD or NVG. As discussed in
the previous section, the low luminance, coupled with low reflectivity from the
ground and foliage, generates a low-contrast image with limited dynamic contrast
range. Detector noise and clutter from the background degrade the image
further. Figure 6 also shows a lack of detail and contrast in the ground before
the treeline, which are essential for situational awareness and navigation.
However, the upper portion of the image has better contrast due to illumination
by the night sky (from the moon and stars). In this more illuminated region, the
foliage can be differentiated, as the objects fall within the CSF for
detection.

Figure 6: An NVD image [From Naval Research Laboratory (NRL)].


Figure 7: A TI image [From NRL].
2. TI Imagery

Figure 7 is a FLIR or TI image of the same scene as Figure 6. The
temperature difference between the regions (due to different cooling rates of the
earth or soil) generates sufficient contrast to see the layout of the ground before
the treeline. The warm air and the low emissivity of the trees also create the
sharp contrast cues of the treeline against the sky (the treeline appears darker).
However, for areas of homogeneity in temperature or emissivity (such as the
foliage of individual trees), there is a lack of contrast or surface information, as
evidenced by the “hollow appearance” of the foliage. Note that IIs do not have
this problem, as they detect the reflected radiation from the surface of the
objects. Hence, the information contained in the II and TI images is
complementary, since the sensors operate in different bands of the
electromagnetic spectrum. This provides the impetus for image or sensor fusion
to improve image quality and content [Scrofani, 1997].


3. Comparison of TI and II Imagery

In a military context, the object of interest tends to be either man-made or
alive. Such objects tend to be warmer than their surroundings, due to body heat
or some other energy-generating process. Without solar heating, the air and the
earth cool down during the night. Hence, these objects of interest will contrast
readily against the background and stand out in a TI image, unless there is deliberate
action to reduce the temperature contrast (such as camouflaging or shielding). In
comparison, II depends greatly on ambient light (artificial or natural) for visibility,
as it amplifies reflected incoming light. Therefore, in a totally dark room, the II will
not be able to generate any image at all, whilst the TI is still able to “see”,
provided that there are temperature gradients present. The TI also has a better
ability to see through smoke, rain and snow, as the longer-wavelength IR
radiation is able to propagate in the presence of such atmospheric particles with
minimal attenuation, unlike shorter-wavelength visible and near-IR radiation, which would be
scattered. As a result, the detection range for TI tends to be greater than that for II.
As II intensifies and amplifies incoming light, there is a possibility of
“overloading” the II detector with a bright or high-luminance source, which could
temporarily “black out” the sensor, similar to human vision when stepping out
from a dark room into bright sunlight. The II is designed to “see” at night, when
the illuminance level is low (0.1 lux or lower). Hence, a source with an intensity
level a couple of orders of magnitude higher is sufficient to overload the II
baseline sensitivity (a handheld flashlight is capable of producing 100 lux or
more). Although the MCP amplifier generally has a non-linear response which
reduces gain at high irradiance, this is still insufficient to isolate bright
sources and avoid such saturation. Figure 8 illustrates this “over-exposure”
pitfall of the II caused by a light source.


Figure  8:  An  II  image  degraded  by  over-exposure  [From  NRL]. 
Figure 9: A TI image of the same scene as Figure 8, displaying
better contrast and level of detail than the II image [From NRL].

Given the tactical advantages of TI and the shortcomings of II, there is
a general preference for TI as the night vision sensor of choice for
detection. However, II still has a slight advantage in identification, because of its
ability to sense surface differences from their reflectivity. The relatively lower cost
and compactness of II systems make them attractive for field deployment, as,
unlike TI systems, they do not require a cooling system for better sensitivity.

In  general,  due  to  limited  reflectivity  characteristics  from  the  scene,  the 
quality  of  II  images  is  hampered  by  lower  contrast.  It  is  difficult  to  discriminate 
objects  from  the  background  and  clutter.  From  the  previous  section,  increasing 
the  contrast  increases  the  visible  stimulus  and  the  probability  of  detection,  as 
demonstrated  by  the  Contrast  Sensitivity  Function  (CSF)  in  Figure  5.  Therefore, 
the usability of II systems for detection will be enhanced if the contrast of the II
images  can  be  improved  or  the  dynamic  range  expanded,  without  altering  the 
spatial  content  of  the  original  image. 
D. OBJECTIVE


Image enhancement techniques to improve visual quality have been
popularized by the proliferation of digital imagery and computers. Techniques
include noise filtering, edge enhancement, color balance and contrast
enhancement, in both the frequency and spatial domains. Even word processing
software such as Microsoft Word offers features or tool options to manipulate the
contrast and brightness levels of images.

Computer-aided  operation  is  also  becoming  a  necessity,  even  in  the 
military.  Advanced  systems  and  arms  modernization  programs  often  involve  the 
integration  of  a  computer  or  a  computer  processing  interface  to  reduce  the 
combat  loading  on  the  soldier  or  improve  system  reaction  time.  One  prime 
example  is  the  Land  Warrior  program  [FAS  website,  2003],  where 
communications,  sensors,  and  materials  are  integrated  into  a  complete  soldier 
system. At the heart of this soldier system is a computer module or subsystem
which integrates all the information and sensors together before presenting them to the
soldier via a helmet-mounted display. The electro-optical sensors include a thermal
weapon sight, image intensifier, video camera (visible) and laser range-finder.
Electro-optical  sensors  are  also  generally  transitioning  from  direct  view  to  remote 
display,  which  provides  a  possibility  for  enhancement. 

Taking these two developments together, it is feasible to digitally
enhance night vision images with a computer algorithm before presenting them to
the user, particularly a military one. Images acquired from the night vision device
can be easily digitized by coupling the sensor output screen to a scanning array
or an analog-to-digital converter. Next, the digital image can be processed by a contrast
enhancement algorithm, such as Contrast-Limited Adaptive Histogram
Equalization (CLAHE), to improve its visible scene content while maintaining the
spatial relations of the original image, before the final improved image is displayed
to the human user.

II systems and images would benefit most from such a contrast
enhancement because of their inherent low-contrast limitation. The II system
would be given a new life, and a new “light” so to speak, if the quality of the II
images can be improved significantly by the proposed algorithm. Furthermore, no
major modification of the II system is required, since the enhancement is done by
a software algorithm.

This  thesis  explores  the  effect  of  such  an  image  enhancement  algorithm 
on  the  night  vision  image.  Chapter  II  briefly  reviews  the  fundamentals  of  digital 
image  processing  and  the  CLAHE  process,  while  Chapter  III  analyses  the 
enhancement  results  obtained  with  the  CLAHE  process.  Finally,  Chapter  IV 
presents  the  conclusions  and  recommendations  for  further  research. 
II. DIGITAL IMAGE PROCESSING


A.  DIGITAL  IMAGE 

A  digital  image  is  essentially  a  two-dimensional  array  of  light-intensity 
levels,  which  can  be  denoted  by  f(x,y),  where  the  value  or  amplitude  of  f  at 
spatial  coordinates  (x,y)  gives  the  intensity  of  the  image  at  the  point.  The 
intensity  is  a  measure  of  the  relative  “brightness”  of  each  point.  The  brightness 
level  is  represented  by  a  series  of  discrete  intensity  shades  from  darkest  to 
brightest,  for  a  monochrome  (single  color)  digital  image.  These  discrete  intensity 
shades  are  usually  referred  to  as  the  “gray  levels”,  with  black  representing  the 
darkest  level  and  white,  the  brightest  level.  These  levels  will  be  encoded  in  terms 
of  binary  bits  in  the  digital  domain,  and  the  most  commonly  used  encoding 
scheme  is  the  8-bit  display  with  256  levels  of  brightness  or  intensity,  starting  from 
level  0  (black)  to  255  (white).  The  digital  image  can  therefore  be  conveniently 
represented  and  manipulated  as  an  N  (number  of  rows)  x  M  (number  of  columns) 
matrix,  with  each  element  containing  a  value  between  0  and  255  (for  an  8-bit 
monochrome  image),  i.e. 


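\[
f(x,y) =
\begin{bmatrix}
f(0,0) & f(0,1) & \cdots & f(0,M-1) \\
f(1,0) & f(1,1) & \cdots & f(1,M-1) \\
\vdots & \vdots & \ddots & \vdots \\
f(N-1,0) & f(N-1,1) & \cdots & f(N-1,M-1)
\end{bmatrix},
\quad \text{where } 0 \le f(x,y) \le 255.
\]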


Different  colors  are  created  by  mixing  different  proportions  of  the  3  primary 
colors:  red,  green  and  blue,  i.e.  RGB  for  short.  Hence,  a  color  image  is 
represented  by  an  N  x  M  x  3  three-dimensional  matrix,  with  each  layer 
representing  the  gray-level  distribution  of  one  primary  color  in  the  image. 
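
As an illustration of this matrix representation, the following Python/NumPy sketch builds a small 8-bit monochrome image and a color image from it. The pixel values and variable names are made up for the example; only the N x M and N x M x 3 layouts come from the text.

```python
import numpy as np

# Hypothetical 3-row x 4-column, 8-bit monochrome image. f[x, y] uses x as the
# row index and y as the column index, following the convention in the text.
f = np.array([[  0,  64, 128, 255],
              [ 32,  96, 160, 224],
              [ 16,  80, 144, 208]], dtype=np.uint8)

N, M = f.shape             # N rows, M columns
top_right = f[0, M - 1]    # the element f(0, M-1)

# A color image stacks three such layers (R, G, B) into an N x M x 3 array.
# Using the same layer three times yields a gray-looking color image.
rgb = np.stack([f, f, f], axis=-1)
```

The `uint8` data type enforces the 0 to 255 range of an 8-bit image, so arithmetic on such arrays is usually done after converting to a wider type to avoid wrap-around.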

Each  point  in  the  image  denoted  by  the  (x,y)  coordinates  is  referred  to  as 
a  pixel.  The  pixel  is  the  smallest  cell  of  information  in  the  image.  It  contains  a 
value of the intensity level corresponding to the detected irradiance. Therefore,
the  pixel  size  defines  the  resolution  and  acuity  of  the  image  seen.  Each  individual 
detector  in  the  sensor  array  and  each  dot  on  the  LCD  (liquid  crystal  display) 
screen  contributes  to  generate  one  pixel  of  the  image.  There  is  actually  a 
physical  separation  distance  between  pixels  due  to  finite  manufacturing 
tolerance.  However,  these  separations  are  not  detectable,  as  the  human  eye  is 
unable  to  resolve  such  small  details  at  normal  viewing  distance  (refer  to 
Rayleigh’s  criterion  for  resolution  of  diffraction-limited  images  [Pedrotti,  1993]). 
For  simplicity,  digital  images  are  represented  by  an  array  of  square  pixels. 

The  relation  between  pixels  constitutes  the  information  contained  in  an 
image.  A  pixel  at  coordinates  (x,y)  has  eight  immediate  neighbors  which  are  a 
unit  distance  away: 


(x-1, y-1)    (x-1, y)    (x-1, y+1)

(x,   y-1)    (x,   y)    (x,   y+1)

(x+1, y-1)    (x+1, y)    (x+1, y+1)

Figure  10:  Neighbors  of  a  Pixel.  Note  the  direction  of  the  x  and  y 
coordinates  used. 
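
The neighborhood in Figure 10 can be enumerated with a short Python helper. This is a minimal sketch: the function name is hypothetical, and dropping neighbors that fall outside the image is an assumption about border handling that the text does not address.

```python
def eight_neighbors(x, y, n_rows, n_cols):
    """Return the in-bounds 8-neighbors of pixel (x, y), where x is the row
    index and y the column index, as in Figure 10."""
    neighbors = []
    for dx in (-1, 0, 1):
        for dy in (-1, 0, 1):
            if dx == 0 and dy == 0:
                continue                      # skip the pixel itself
            nx, ny = x + dx, y + dy
            if 0 <= nx < n_rows and 0 <= ny < n_cols:
                neighbors.append((nx, ny))
    return neighbors
```

An interior pixel has all eight neighbors, while a corner pixel such as (0, 0) keeps only three.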


Pixels  can  be  connected  to  form  boundaries  of  objects  or  components  of 
regions  in  an  image  when  the  gray  levels  of  adjacent  pixels  satisfy  a  specified 
criterion  of  similarity  (equal  or  within  a  small  difference).  The  difference  in  the 
gray  levels  of  two  adjacent  pixels  gives  the  contrast  needed  to  differentiate 
between  regions  or  objects.  This  difference  has  to  be  of  a  certain  magnitude  in 
order  for  the  human  eye  to  identify  it  as  a  boundary. 
B. IMAGE PROCESSING METHODS


There  are  two  main  methods  to  process  an  image  as  defined  by  the 
domain  in  which  the  image  is  processed,  namely  the  spatial  domain  or  the 
frequency  domain.  The  spatial  domain  refers  to  the  image  plane  itself,  and 
approaches  in  this  category  are  based  on  direct  manipulation  of  pixels  in  an 
image.  Frequency  domain  processing  techniques  are  based  on  modifying  the 
spatial  frequency  spectrum  of  the  image  as  obtained  by  the  Fourier  transform. 
Enhancement  techniques  based  on  various  combinations  of  methods  from  these 
two  categories  are  not  unusual  and  the  same  enhancement  technique  can  also 
be  implemented  in  both  domains,  yielding  identical  results  [Gonzalez  and  Woods, 
1993]. 


1.  Spatial  Domain  Methods 

The  spatial  domain  refers  to  the  aggregate  of  pixels  composing  an  image, 
and  spatial  domain  methods  are  procedures  that  operate  directly  on  these  pixels. 
Image  processing  functions  in  the  spatial  domain  may  be  expressed  as: 

\[ g(x,y) = T[f(x,y)], \tag{1} \]

where f(x,y) is the input image data, g(x,y) is the processed image data,
and T is an operator on f, defined over some neighborhood of (x,y). In addition, T
can also operate on a set of input images, for example by summing a number of
images pixel-by-pixel and averaging them for noise reduction.

The  principal  approach  to  defining  a  neighborhood  about  (x,y)  is  to  use  a 
square  or  rectangular  mask  centered  at  (x,y).  The  center  of  this  mask  or  window 
is  moved  from  pixel  to  pixel,  and  the  operator  applied  at  each  location  (x,y)  to 
yield  the  corresponding  g  for  that  location.  The  resultant  g(x,y)  is  stored 
separately,  instead  of  changing  pixel  values  in  place,  to  avoid  a  “snow-balling” 
effect  of  the  altered  gray  levels. 
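
The moving-mask procedure can be sketched as follows in Python with NumPy. The function name is hypothetical, and replicating edge values at the image border is an assumption, since the text does not specify how borders are treated. The result is written to a separate array g, as described above.

```python
import numpy as np

def apply_neighborhood_operator(f, operator, size=3):
    """Slide a size x size window over f and store operator(window) in a
    separate output array g, so altered values never feed back into later
    windows. Borders are handled by replicating edge values (an assumption)."""
    pad = size // 2
    padded = np.pad(f.astype(np.float64), pad, mode="edge")
    g = np.empty(f.shape, dtype=np.float64)
    for x in range(f.shape[0]):
        for y in range(f.shape[1]):
            window = padded[x:x + size, y:y + size]   # neighborhood of (x, y)
            g[x, y] = operator(window)
    return g

# Hypothetical test image: a single bright pixel on a dark background.
f = np.array([[10,  10, 10],
              [10, 100, 10],
              [10,  10, 10]], dtype=np.uint8)
g = apply_neighborhood_operator(f, np.mean)   # T = 3x3 averaging
```

Here every 3x3 window contains the bright pixel once, so each output value is (8 x 10 + 100) / 9 = 20, while the input array f is left unchanged.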
2. Frequency Domain Methods

The  foundation  of  frequency  domain  techniques  is  the  convolution 
theorem.  The  processed  image,  g(x,y),  is  formed  by  the  convolution  of  an  image 
f(x,y)  and  a  linear,  position-invariant  operation  h(x,y),  that  is 

g(x,y)  =  h(x,y)*f(x,y).  (2) 

By  the  convolution  theorem,  the  following  frequency  domain  relation  holds: 

G(u,v)  =  H(u,v)  F(u,v),  (3) 

where  G,  H,  and  F  are  the  Fourier  transforms  of  g,  h  and  f  respectively. 

H(u,v)  is  called  the  transfer  function  of  the  process.  In  a  typical  image 
enhancement  application,  f(x,y)  is  given  and  the  goal,  after  computing  F(u,v),  is 
to select an H(u,v) so that the desired image g(x,y) exhibits some highlighted
feature of f(x,y), i.e.

\[ g(x,y) = \mathcal{F}^{-1}\left[\, H(u,v)\, F(u,v) \,\right], \tag{4} \]

where \mathcal{F}^{-1} denotes the inverse Fourier transform. For instance, edges
in f(x,y) can be accentuated by using a function H(u,v)
that emphasizes the high-frequency components of F(u,v).
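
The relation between the spatial and frequency domain forms can be checked numerically. The sketch below, in Python with NumPy, is illustrative only: the function name, the random test image and the 3x3 averaging kernel are assumptions. Because the discrete Fourier transform is periodic, the product in the frequency domain corresponds to circular (wrap-around) convolution in the spatial domain.

```python
import numpy as np

def filter_in_frequency_domain(f, H):
    """Apply transfer function H(u,v) to image f(x,y): G = H * F, then
    g = inverse FFT of G. H must have the same shape as f and use NumPy's
    unshifted FFT layout (zero frequency at index [0, 0])."""
    F = np.fft.fft2(f)
    G = H * F
    return np.real(np.fft.ifft2(G))

rng = np.random.default_rng(0)
f = rng.random((8, 8))                 # hypothetical 8x8 test image

# Hypothetical spatial kernel h: a 3x3 average, zero-padded to the image size.
h = np.zeros_like(f)
h[:3, :3] = 1.0 / 9.0
H = np.fft.fft2(h)                     # its transfer function

g_freq = filter_in_frequency_domain(f, H)

# Direct circular convolution g = h * f for comparison.
g_spatial = np.zeros_like(f)
for dx in range(3):
    for dy in range(3):
        g_spatial += h[dx, dy] * np.roll(f, shift=(dx, dy), axis=(0, 1))
```

The two results agree to within floating-point rounding, which is the convolution theorem in discrete form.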

3. Global and Local Methods

Image  processing  methods  that  involve  using  a  single  transformation 
function  for  the  whole  image  are  classified  as  global  methods  or  algorithms.  The 
lowpass/highpass  filters  and  histogram  transformation  are  examples  of  global 
enhancement  methods.  The  main  advantage  of  global  methods  is  that  they  are 
computationally  inexpensive  and  simple  to  implement.  However,  global  methods 
may  attenuate  or  miss  local  information  while  working  on  the  overall 
characteristic  of  the  image. 

The  transformation  function  of  a  local  processing  method  is  dependent  on 
the  location  and  the  neighborhood  of  the  pixel  looked  at,  i.e. 

\[ g(x,y) = T[x, y, f(x,y)]. \tag{5} \]
These methods are therefore “adaptive” to the local information within the
image.  Adaptive  histogram  equalization  is  an  example  of  such  a  local  processing 
method  and  is  effective  in  enhancing  details  in  local  areas  of  the  image. 
However,  pixels  of  the  same  gray  level  in  the  original  image  may  be  mapped  to 
different  gray  levels  in  the  output  image,  due  to  the  various  “localized”  mapping 
or  transformation  functions,  which  could  artificially  alter  the  appearance  of  the 
original  image.  Abrupt  changes  or  boundaries  may  also  result  in  the  image, 
because  each  transformation  is  done  locally  and  independently. 
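
The difference between a global and a local mapping can be seen in a small Python/NumPy sketch. It is deliberately simplified: the tiles do not overlap, there is no interpolation between tiles and no contrast limiting, so this is not CLAHE, only the local idea behind adaptive histogram equalization. The function names and the test image are made up for the example.

```python
import numpy as np

def equalize(tile):
    """Plain histogram equalization of an 8-bit tile via its cumulative histogram."""
    hist = np.bincount(tile.ravel(), minlength=256)
    cdf = np.cumsum(hist) / tile.size
    mapping = np.round(255.0 * cdf).astype(np.uint8)   # old level -> new level
    return mapping[tile]

def equalize_by_tiles(image, tile_size):
    """Equalize each non-overlapping tile independently of the others."""
    out = np.empty_like(image)
    for r in range(0, image.shape[0], tile_size):
        for c in range(0, image.shape[1], tile_size):
            block = image[r:r + tile_size, c:c + tile_size]
            out[r:r + tile_size, c:c + tile_size] = equalize(block)
    return out

# Hypothetical 4x8 image: a darker left half and a brighter right half,
# both of which contain gray level 100 (columns 3 and 4).
row = np.array([10, 20, 50, 100, 100, 150, 200, 250], dtype=np.uint8)
image = np.tile(row, (4, 1))

global_out = equalize(image)                       # one mapping for the whole image
local_out = equalize_by_tiles(image, tile_size=4)  # one mapping per 4x4 tile
```

With the global mapping, both pixels of gray level 100 receive the same output level. With the tile-wise mapping, the one in the left tile (where 100 is the brightest level) maps to 255, while the one in the right tile (where 100 is the darkest level) maps to 64, producing an abrupt boundary between the tiles.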

C.  FILTERS 

Filtering  refers  to  the  selective  processing  of  an  image  to  remove 
unwanted  aspects  of  the  image  or  to  transform  only  certain  portions  of  the  image. 
Lowpass  filters  attenuate  or  eliminate  high-frequency  components  in  the  Fourier 
domain,  while  allowing  low  frequencies  to  pass  through  untouched.  As  the  high 
frequency  components  characterize  edges  and  other  sharp  details  in  an  image, 
the  net  effect  of  lowpass  filtering  is  image  blurring  [Gonzalez  and  Woods,  1993]. 
Hence,  lowpass  filters  are  also  known  as  smoothing  filters  and  are  commonly 
used  for  noise  reduction. 

Similarly,  highpass  filters  attenuate  low-frequency  components.  Because 
these  components  are  responsible  for  the  slowly  varying  characteristics  of  an 
image,  such  as  overall  contrast  and  average  intensity,  the  net  result  of  highpass 
filtering  is  a  reduction  of  these  features  and  a  corresponding  apparent 
sharpening  of  edges  and  other  sharp  details.  Highpass  filters  are  therefore 
known  also  as  sharpening  filters. 

1.  Lowpass  Filtering 

As  indicated  earlier,  edges  and  other  sharp  transitions  (such  as  noise)  in 
the  gray  levels  of  an  image  contribute  significantly  to  the  high-frequency  content 
of its Fourier transform. Hence, blurring or smoothing is achieved in the
frequency domain by attenuating a specified range of high-frequency
components in the transform of a given image.
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A  2-D  ideal  lowpass  filter  is  one  whose  transfer  function  in  equation  (4) 
satisfies  the  relation; 


H(u,v) 


i1  ifD(u,v)  <  D, 
[0  ifD(u,v)>D,’ 


(6) 


where $D_0$ is a specified non-negative quantity and D(u,v) is the distance
from point (u,v) to the origin of the frequency plane, i.e.

$$D(u,v) = (u^2 + v^2)^{1/2}. \tag{7}$$

The point of transition between H(u,v) = 1 and H(u,v) = 0, $D_0$, is called the
cutoff frequency. One way to establish this cutoff frequency is to define the
percent of signal power to be contained within or passed by the filter. $D_0$ is
then the radius of a circle centered on the origin of the 2-dimensional
frequency plot. For an ideal filter, this transition is an abrupt step, i.e.
frequencies equal to or less than $D_0$ are passed with no attenuation, while
frequencies higher than $D_0$ are completely attenuated. However, this sharp
cutoff cannot be realized with electronic components.

The  Butterworth  lowpass  filter  was  formulated  to  address  this  practical 
limitation,  as  it  does  not  have  a  sharp  discontinuity  between  passed  and  filtered 
frequencies.  The  Butterworth  transfer  function  (of  order  n)  is  defined  as  follows 
[Gonzalez  and  Woods,  1993]: 


$$H(u,v) = \frac{1}{1 + [D(u,v)/D_0]^{2n}}. \tag{8}$$


Lowpass smoothing filters can also be implemented in the spatial domain.
Figure 11 shows a general 3x3 linear mask with arbitrary coefficients (weights)
$w_1, w_2, \ldots, w_9$. Denoting the gray levels of the pixels under the mask at
any location by $z_1, z_2, \ldots, z_9$, the response of the mask is:

$$R = w_1 z_1 + w_2 z_2 + \cdots + w_9 z_9. \tag{9}$$

Figure 11: A 3x3 spatial mask with arbitrary coefficients [From
Gonzalez and Woods, 1993].

All  the  coefficients  of  the  mask  are  set  to  a  value  of  1  for  simple  smoothing 
processing.  The  response  from  the  mask  would  be  the  sum  of  gray  levels  for  the 
nine pixels under the mask, as per equation (9). This response R is then scaled
down  by  dividing  by  the  total  number  of  pixels  (nine  in  this  case)  to  keep  within 
the  original  gray  levels  range.  Therefore,  the  response  or  result  would  simply  be 
the  average  of  all  the  pixels  in  the  area  of  the  mask.  Larger  masks  (e.g.  5x5  and 
7x7)  follow  the  same  concept  and  will  blur  the  image  further  with  larger 
neighborhood  averaging.  For  the  border  pixels  of  the  image,  there  will  be  a 
shortage  of  neighborhood  pixels  for  the  mask.  One  option  is  to  pad  the  shortage 
with  pixels  of  the  same  values  as  the  center  pixel  or  a  reference  pixel.  Another 
option  is  to  process  one  layer  less  of  pixels,  i.e.  no  filtering  on  the  border  pixels. 
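
The neighborhood averaging described above can be sketched in a few lines of
Python. This is only an illustration and not the implementation used in this
thesis: the function name mean_filter_3x3 and the small test image are
hypothetical, and the sketch adopts the second border option (border pixels
are left unfiltered).

```python
def mean_filter_3x3(image):
    """Smooth a 2-D list of gray levels with a 3x3 mask whose weights are all 1."""
    rows, cols = len(image), len(image[0])
    output = [row[:] for row in image]  # border pixels are copied unchanged
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Mask response R with w1 = ... = w9 = 1: sum of the nine gray levels
            response = sum(image[r + dr][c + dc]
                           for dr in (-1, 0, 1) for dc in (-1, 0, 1))
            output[r][c] = response / 9  # scale back into the input range
    return output


# Hypothetical 4x4 image with one bright pixel on a flat background
image = [
    [10, 10, 10, 10],
    [10, 100, 10, 10],
    [10, 10, 10, 10],
    [10, 10, 10, 10],
]
smoothed = mean_filter_3x3(image)
```

The isolated bright pixel is spread over its neighborhood, which is the
blurring effect expected from a smoothing mask.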

Lowpass  filters  are  generally  used  for  blurring  and  for  noise  reduction  in 
preprocessing  steps,  such  as  the  removal  of  small  details  from  an  image  prior  to 
object  extraction,  and  bridging  of  small  gaps  in  lines  or  curves.  Figure  12 
illustrates the effect of a lowpass filter.

Figure 12: Lowpass filtering with a 3x3 spatial filter or 98 percent
power $D_0$ locus. The top image is the original image and the bottom the
processed image, where the image details have been blurred.

2. Highpass Filtering

Image  sharpening  can  be  achieved  in  the  frequency  domain  by  a  highpass 
filtering  process  as  edges  and  other  abrupt  changes  in  gray  levels  are  associated 
with  high-frequency  components.  Such  filtering  attenuates  the  low-frequency 
components  without  disturbing  high-frequency  information  in  the  Fourier 
transform. Highpass filters are therefore known also as sharpening filters.

The  highpass  filtering  process  can  be  implemented  in  both  the  frequency 
and  spatial  domains.  For  highpass  filtering  in  the  frequency  domain,  the  transfer 
function  is  essentially  the  inverse  of  that  obtained  for  lowpass  filtering, 


$$H(u,v) = \begin{cases} 0 & \text{if } D(u,v) \le D_0 \\ 1 & \text{if } D(u,v) > D_0. \end{cases} \tag{10}$$


The transfer function of the Butterworth highpass filter of order n and with
cutoff frequency locus at distance $D_0$ from the origin is defined by the relation

$$H(u,v) = \frac{1}{1 + [D_0/D(u,v)]^{2n}}. \tag{11}$$
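
The ideal and Butterworth transfer functions given above for lowpass and
highpass filtering can be evaluated point by point, as in the following Python
sketch. The function names are hypothetical, and returning zero at the origin
for the Butterworth highpass filter is an assumption made here to avoid a
division by zero (it is the limiting value of the expression as D(u,v)
approaches zero).

```python
import math


def distance(u, v):
    """D(u,v): distance from the point (u, v) to the origin of the frequency plane."""
    return math.hypot(u, v)


def ideal_lowpass(u, v, d0):
    """Pass frequencies at or inside the cutoff radius d0, reject the rest."""
    return 1.0 if distance(u, v) <= d0 else 0.0


def ideal_highpass(u, v, d0):
    """Complement of the ideal lowpass filter."""
    return 1.0 - ideal_lowpass(u, v, d0)


def butterworth_lowpass(u, v, d0, n):
    """Butterworth lowpass transfer function of order n."""
    return 1.0 / (1.0 + (distance(u, v) / d0) ** (2 * n))


def butterworth_highpass(u, v, d0, n):
    """Butterworth highpass transfer function of order n."""
    d = distance(u, v)
    if d == 0.0:
        return 0.0  # assumed limiting value at the zero-frequency point
    return 1.0 / (1.0 + (d0 / d) ** (2 * n))


# At the cutoff radius (here D = 5 for u = 3, v = 4) both Butterworth filters give 0.5
at_cutoff_low = butterworth_lowpass(3, 4, d0=5, n=2)
at_cutoff_high = butterworth_highpass(3, 4, d0=5, n=2)
```

For the same order and cutoff, the two Butterworth responses sum to one at
every nonzero frequency, which mirrors the complementary relationship of the
ideal filters.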


The  principal  objective  of  sharpening  is  to  highlight  fine  detail  in  an  image 
or  to  enhance  detail  that  has  been  blurred,  either  in  error  or  as  a  natural  effect  of 
a  particular  method  of  image  acquisition.  Uses  of  image  sharpening  vary  and 
include  applications  ranging  from  electronic  printing  to  medical  imaging  to 
industrial  inspection  and  autonomous  object  detection. 

A  basic  3x3  highpass  spatial  mask  is  shown  in  Figure  13.  The  center 
coefficient  is  positive  while  the  rest  of  the  mask  contains  negative  coefficients. 
The  sum  of  the  coefficients  is  then  equal  to  zero.  Thus,  the  output  of  the  mask  is 
zero  or  very  small  when  the  mask  is  over  an  area  of  constant  or  slowly  varying 
gray  level.  As  with  highpass  frequency  filtering,  the  zero-frequency  term  is 
attenuated  or  eliminated.  This  will  reduce  the  average  gray-level  value  in  the 
image  to  zero,  which  in  turn  reduces  the  global  contrast  of  the  image.  The 
expected  result  from  such  a  highpass  mask  is  therefore  characterized  by 
highlighted edges over a dark background. Reducing the average value of an
image to zero also implies that the image may have negative gray levels due to
the negative coefficients in the mask. Next, the results have to be adjusted or
clipped and scaled down (by dividing by the number of pixels in the mask) to
keep the output within the original (non-negative) gray level range.

Figure 13: A basic highpass spatial filter [From Gonzalez and
Woods, 1993].

A  highpass  filtered  image  can  be  computed  as  the  difference  between  the 
original  image  and  a  lowpass  filtered  version  of  the  same  image,  as  the  highpass 
filter  is  the  complement  of  the  lowpass,  i.e., 

Highpass  =  Original  -  Lowpass.  (12) 

Multiplying  the  original  image  by  an  amplification  factor,  denoted  by  A, 
yields  the  definition  of  a  high-boost  or  high-frequency-emphasis  filter,  i.e., 

Highboost  =  (A)(Original)  -  Lowpass, 

=  (A-1) (Original)  +  Original  -  Lowpass, 

=  (A-1)(Original)  +  Highpass.  (13) 
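
The algebra of equations (12) and (13) can be checked numerically. The Python
sketch below is an illustration only: it assumes that the lowpass image is a
3x3 neighborhood average, the names mean_3x3 and high_boost are hypothetical,
and border pixels are simply left at zero.

```python
def mean_3x3(image, r, c):
    """Lowpass value at (r, c): average of the 3x3 neighborhood."""
    return sum(image[r + dr][c + dc]
               for dr in (-1, 0, 1) for dc in (-1, 0, 1)) / 9


def high_boost(image, a):
    """Compute (A)(Original) - Lowpass for interior pixels; borders stay at 0."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            out[r][c] = a * image[r][c] - mean_3x3(image, r, c)
    return out


# Hypothetical image with a vertical edge between gray levels 10 and 50
image = [
    [10, 10, 10, 10],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
    [10, 10, 50, 50],
]
A = 1.8
highpass = high_boost(image, 1.0)  # A = 1 reduces to Original - Lowpass
boosted = high_boost(image, A)
```

At every interior pixel the boosted value equals (A-1)(Original) plus the
highpass value, as stated in equation (13). Negative results, such as the one
on the dark side of the edge, would still need clipping or rescaling before
display.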

When A > 1, part of the original is added back to the highpass result, which
partially restores the low-frequency components lost in the highpass filtering
operation. The result is that the high-boost image looks more like the original
image, with a relative degree of edge enhancement that depends on the value of
A. Therefore, the center weight of the high-boost filter can be represented by

$$w_5 = 9A - 1, \quad \text{with } A \ge 1. \tag{14}$$

When A = 1, the basic highpass filter is obtained as in Figure 13
[Gonzalez and Woods, 1993].

Figure 14: High-boost filtering with A = 1.8. The bottom image is the
processed image. The brightness of the image is lowered and the
features of the ships sharpened.

D. HISTOGRAM

An image histogram is a plot of the distribution of intensities or gray levels
in an image. The histogram of a digital image with gray levels in the range
[0, L-1] can be represented by the discrete function

$$p(r_k) = \frac{n_k}{n}, \tag{15}$$

where $r_k$ is the kth gray level, $n_k$ is the number of pixels in the image
with that gray level, n is the total number of pixels in the image, and
k = 0, 1, 2, ..., L-1.


Figure 15: Histograms of four basic image types [After Gonzalez
and Woods, 1993].

The image histogram gives an estimate of the probability of occurrence of
a gray level $r_k$. A plot of this function for all values of k also provides a global
description  of  the  appearance  of  an  image.  For  example,  Figure  15  shows  the 
histograms  of  four  basic  types  of  images.  The  histogram  in  Figure  15(a)  shows 
that  the  gray  levels  are  concentrated  toward  the  dark  end  of  the  gray  scale 
range.  Thus,  this  histogram  corresponds  to  an  image  with  overall  dark 
characteristics.  Figure  15(b)  is  the  opposite,  with  a  bright  image.  The  histogram 
shown  in  Figure  15(c)  has  a  narrow  shape,  which  indicates  little  dynamic  range 
and  thus  corresponds  to  an  image  having  low  contrast,  while  Figure  15(d)  shows 
a  histogram  with  significant  spread,  corresponding  to  an  image  with  high 
contrast. 
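
The discrete function of equation (15) amounts to counting pixels per gray
level and dividing by the total pixel count. A minimal Python sketch follows;
the function name normalized_histogram and the 2x4 test image with four gray
levels are hypothetical.

```python
def normalized_histogram(image, levels):
    """Return p(r_k) = n_k / n for k = 0 .. levels-1, as in equation (15)."""
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1  # n_k: number of pixels with gray level k
    n = sum(counts)            # n: total number of pixels
    return [count / n for count in counts]


# Hypothetical image with gray levels in the range [0, 3]
image = [
    [0, 1, 1, 2],
    [1, 1, 2, 3],
]
p = normalized_histogram(image, levels=4)
```

Because the values are relative frequencies, they sum to one and can be read
as estimated probabilities of occurrence.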

Although  the  histogram  does  not  provide  any  specific  information  about 
the image content, the shape and distribution of the histogram provide an avenue
for  contrast  enhancement.  However,  the  histogram  is  a  global  representation  of 
the  intensity  characteristics  within  an  image  and  therefore,  histogram 
transformation  affects  the  whole  image,  i.e.  globally.  This  differs  from  the 
localized  methods  such  as  the  spatial  mask  and  filters,  which  depend  only  on  the 
pixel  looked  at  and  its  neighbors. 

E.  HISTOGRAM  EQUALIZATION 

The  histogram  of  an  image  represents  the  relative  frequency  of 
occurrence  of  gray  levels  within  an  image.  It  also  represents  the  probability  of 
such  an  occurrence.  With  a  narrow  distribution  of  gray  levels  (refer  to  Figure 
15(c)),  the  contrast  in  the  image  will  be  low  and  the  dynamic  range  limited. 
Hence,  a  good  gray  level  assignment  scheme  would  be  to  expand  the  intensity 
range  to  fill  the  whole  dynamic  range  available.  The  probability  of  occurrence  of 
all  gray  levels  should  be  equal  or  uniform.  In  histogram  equalization,  the  goal  is 
to  obtain  a  uniform  histogram  distribution  for  the  output  image,  so  that  an  optimal 
overall contrast is perceived.

An outline of the histogram equalization process is as follows [Gonzalez
and Woods, 1993]:

Let  the  variable  r  represent  the  gray  levels  in  the  image  to  be  enhanced  or 
equalized.  The  pixel  values  can  be  normalized  to  form  continuous  quantities  in 
the  interval  [0,  1],  with  r  =  0  representing  black  and  r  =  1  representing  white. 

For any r in the interval [0, 1], the transformation is of the form:

$$s = T(r), \tag{16}$$

which produces a gray level s for every level of r in the original image. It is
assumed that the transformation function given in equation (16) satisfies the
conditions: (a) T(r) is single-valued and monotonically increasing in the
interval $0 \le r \le 1$; and (b) $0 \le T(r) \le 1$ for $0 \le r \le 1$.
Condition (a) preserves the order from black to white in the gray scale, whereas
condition (b) guarantees a mapping that is consistent with the allowed range of
gray levels.

The inverse transformation from s back to r is then denoted

$$r = T^{-1}(s), \quad 0 \le s \le 1, \tag{17}$$

where the assumption is that $T^{-1}(s)$ also satisfies conditions (a) and (b)
with respect to the variable s.

The gray levels in an image may be viewed as random quantities in the
interval [0, 1]. If they are continuous variables, both the original and
transformed gray levels can be characterized by their probability density
functions $p_r(r)$ and $p_s(s)$ respectively, where the subscripts on p are used
to indicate that $p_r$ and $p_s$ are different functions.

The probability density function of the transformed gray levels can
therefore be expressed by:

$$p_s(s) = \left[ p_r(r) \frac{dr}{ds} \right]_{r = T^{-1}(s)}. \tag{18}$$

Consider the transformation function

$$s = T(r) = \int_0^r p_r(w)\,dw, \quad 0 \le r \le 1, \tag{19}$$

where  w  is  a  dummy  variable  of  integration.  Equation  (19)  is  actually  the 
cumulative  distribution  function  (CDF)  of  r.  Conditions  (a)  and  (b)  presented 
earlier  are  satisfied  by  this  transformation  function,  because  the  CDF  increases 
monotonically  from  0  to  1  as  a  function  of  r. 

From equation (19), the derivative of s with respect to r is

$$\frac{ds}{dr} = p_r(r). \tag{20}$$

Substituting equation (20) into equation (18) yields

$$p_s(s) = \left[ p_r(r) \frac{1}{p_r(r)} \right]_{r = T^{-1}(s)} = 1, \quad 0 \le s \le 1, \tag{21}$$

which  gives  a  uniform  density  in  the  interval  of  the  transformed  variable  s. 
This  result  is  independent  of  the  inverse  transformation  function.  Thus,  using  the 
cumulative  distribution  function  of  r  as  the  transformation  function  produces  an 
image  with  uniform  density  gray  levels  and  with  better  contrast  distribution. 

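For discrete formulation, the probabilities are replaced by:

$$p_r(r_k) = \frac{n_k}{n}, \quad 0 \le r_k \le 1 \text{ and } k = 0, 1, \ldots, L-1, \tag{22}$$

and equation (19) will be given by the relation

$$s_k = T(r_k) = \sum_{j=0}^{k} \frac{n_j}{n} = \sum_{j=0}^{k} p_r(r_j). \tag{23}$$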

A MATLAB implementation for the histogram equalization is available in
Appendix A.

Figure 16: Result of histogram equalization. The bottom image is
the processed output.

Figure 17: Image histograms before and after equalization.

Figures 16 and 17 show the histogram equalization results and
corresponding histograms. The improvement over the original image is quite
evident, as the treeline and foliage are now much more clearly defined. Looking
at the histogram plots, the gray levels of the equalized image are spread out,
resulting in an increase in the dynamic range of gray levels and hence in the
overall contrast of the image.

Histogram  equalization  significantly  improves  the  visual  appearance  of  the 
image.  Similar  enhancement  results  could  have  been  achieved  by  using  a 
contrast  stretching  approach,  but  the  main  advantage  of  histogram  equalization 
over  manual  contrast  stretching  or  manipulation  techniques  is  that  the  former  is 
fully  automatic,  without  the  need  to  select  any  setting  or  to  adapt  to  the  original 
histogram  distribution  of  the  image. 
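
The discrete mapping of equation (23) can be sketched in Python as follows.
This is an illustration only and is not the MATLAB code of Appendix A: the
function name equalize and the test image are hypothetical, and rescaling the
normalized value of the cumulative sum to integer levels by multiplying by
L-1 and rounding is an assumption of this sketch.

```python
def equalize(image, levels):
    """Map each gray level k to round((L-1) * s_k), with s_k the cumulative sum."""
    counts = [0] * levels
    for row in image:
        for gray in row:
            counts[gray] += 1
    n = sum(counts)
    mapping, cumulative = [], 0
    for count in counts:
        cumulative += count      # running sum of n_j for j = 0 .. k
        s_k = cumulative / n     # normalized cumulative distribution value
        mapping.append(round((levels - 1) * s_k))
    return [[mapping[gray] for gray in row] for row in image]


# Hypothetical low-contrast image: only levels 3, 4 and 5 of 8 are used
image = [
    [3, 3, 4, 4],
    [3, 4, 4, 5],
]
equalized = equalize(image, levels=8)
```

In this small example the occupied gray levels move from the range 3 to 5 to
the range 3 to 7, i.e. the narrow histogram is spread over more of the
available dynamic range without any user-selected setting.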

F.  ADAPTIVE  HISTOGRAM  EQUALIZATION 

In  low  contrast  images,  the  features  of  interest  may  occupy  only  a 
relatively  narrow  range  of  gray  scale,  with  the  majority  of  gray  levels  occupied  by 
“uninteresting  areas”  such  as  background  and  noise.  These  “uninteresting  areas” 
may  also  generate  large  counts  of  pixels  and  hence,  large  peaks  in  the 
histogram.  In  this  case,  the  global  histogram  equalization  amplifies  the  image 
noise  and  increases  visual  graininess  or  patchiness.  The  global  histogram 
equalization  technique  does  not  adapt  to  local  contrast  requirements,  and  minor 
contrast  differences  can  be  entirely  missed  when  the  number  of  pixels  falling  in  a 
particular  gray  range  is  small. 

Adaptive  Histogram  Equalization  (AHE)  is  a  modified  histogram 
equalization  procedure  that  optimizes  contrast  enhancement  based  on  local 
image  data.  The  basic  idea  behind  the  scheme  is  to  divide  the  image  into  a  grid 
of  rectangular  contextual  regions,  and  to  apply  a  standard  histogram  equalization 
in  each.  The  optimal  number  of  contextual  regions  and  the  size  of  the  regions 
depend  on  the  type  of  input  image,  and  the  most  commonly  used  region  size  is 
8x8  (pixels).  In  addition,  a  bi-linear  interpolation  scheme  is  used  to  avoid 
discontinuity  issues  at  the  region  boundaries. 

Figure 18 illustrates the application of the interpolation scheme at the
boundaries. The gray level assignment at the sample position indicated by the
white dot is derived from the gray-value distributions in the surrounding
contextual regions. The points A, B, C, and D are the centers of the surrounding
contextual regions; the region-specific gray level mappings ($g_A(s)$, $g_B(s)$,
$g_C(s)$ and $g_D(s)$) are based on the histogram equalization of the pixels
contained. Thus, assuming that the original pixel intensity at the sample point
is s, its new gray value s' is calculated by bilinear interpolation of the
gray-level mappings that were calculated for each of the surrounding contextual
regions:

$$s' = (1-y)\bigl((1-x)\,g_A(s) + x\,g_B(s)\bigr) + y\bigl((1-x)\,g_C(s) + x\,g_D(s)\bigr), \tag{24}$$

where x and y are normalized distances with respect to the point A. This
gray level interpolation is repeated over the entire image [Zuiderveld, 1994].

Figure 18: Bilinear interpolation to eliminate region boundaries
[From Zuiderveld, 1994].
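
The interpolation of equation (24) can be sketched in Python as below. The
four lookup tables are hypothetical placeholders for the region mappings, and
the layout assumed here (A upper left, B upper right, C lower left, D lower
right, with x and y measured from A) is an assumption that is consistent with
the equation but is not stated explicitly in the text.

```python
def interpolate_mappings(s, x, y, g_a, g_b, g_c, g_d):
    """Bilinear interpolation of four region mappings, following equation (24)."""
    top = (1 - x) * g_a[s] + x * g_b[s]      # blend along x between A and B
    bottom = (1 - x) * g_c[s] + x * g_d[s]   # blend along x between C and D
    return (1 - y) * top + y * bottom        # blend along y between the two rows


# Hypothetical mappings (lookup tables indexed by the input gray level s)
g_a = [0, 10, 20, 30]
g_b = [0, 20, 40, 60]
g_c = [0, 30, 60, 90]
g_d = [0, 40, 80, 120]

s_new = interpolate_mappings(2, 0.25, 0.5, g_a, g_b, g_c, g_d)
s_at_a = interpolate_mappings(2, 0.0, 0.0, g_a, g_b, g_c, g_d)
```

At the center of region A (x = 0, y = 0) the result reduces to the mapping of
region A alone, so the output varies smoothly from one region to the next.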


AHE  is  able  to  overcome  the  limitations  of  the  standard  equalization 
method  as  discussed  earlier,  and  achieves  a  better  presentation  of  information 
present  in  the  image.  However,  AHE  is  unable  to  distinguish  between  noise  and 
features  in  the  local  contextual  regions.  Hence,  background  noise  is  amplified  in 
“flat”  or  “featureless”  regions  of  the  image,  which  is  a  major  drawback  of  the 
method.

G. CONTRAST LIMITED ADAPTIVE HISTOGRAM EQUALIZATION

The  noise  problem  associated  with  AHE  can  be  reduced  by  limiting 
contrast  enhancement  specifically  in  homogeneous  areas.  These  areas  can  be 
characterized  by  a  high  peak  in  the  histogram  associated  with  the  contextual 
regions  since  many  pixels  fall  inside  the  same  gray  level  range.  The  Contrast 
Limited  Adaptive  Histogram  Equalization  (CLAHE)  limits  the  slope  associated 
with  the  gray  level  assignment  scheme  to  prevent  saturation,  as  illustrated  in 
Figure  19.  This  process  is  accomplished  by  allowing  only  a  maximum  number  of 
pixels  in  each  of  the  bins  associated  with  the  local  histograms.  After  “clipping”  the 
histogram,  the  clipped  pixels  are  equally  redistributed  over  the  whole  histogram 
to  keep  the  total  histogram  count  identical.  The  CLAHE  process  is  summarized  in 
Table 1.


Figure 19: Principle of contrast limiting as used in CLAHE. (a)
Histogram  of  a  contextual  region  containing  many  background  pixels, 
(b)  Calculated  cumulative  histogram,  (c)  Clipped  histogram  with  excess 
pixels  redistributed  throughout  the  histogram,  (d)  Cumulative  clipped 
histogram  with  maximum  slope  set  to  the  clip  limit  [From  Zuiderveld, 
1994]. 


The clip limit is defined as a multiple of the average histogram contents
and is actually a contrast factor. Setting a very high clip limit effectively
disables the clipping, and the process becomes a standard AHE technique. A clip
or contrast factor of one prohibits any contrast enhancement, preserving the
original image.

Table 1. Summary of CLAHE process [Mathworks, 2003].

1.  Obtain  all  the  inputs: 

•  Image 

•  Number  of  regions  in  row  and  column  directions 

•  Number  of  bins  for  the  histograms  used  in  building  image 
transform  function  (dynamic  range) 

•  Clip limit for contrast limiting (normalized from 0 to 1)

2.  Pre-process  the  inputs: 

•  Determine  real  clip  limit  from  the  normalized  value. 

•  If  necessary,  pad  the  image  (to  even  size)  before  splitting 
into  regions. 

3.  Process  each  contextual  region  (tile)  thus  producing  gray  level 
mappings: 

•  Extract  a  single  image  region. 

•  Make  a  histogram  for  this  region  using  the  specified  number 
of  bins. 

•  Clip  the  histogram  using  clip  limit. 

•  Create  a  mapping  (transformation  function)  for  this  region. 

4.  Interpolate  gray  level  mappings  in  order  to  assemble  final 
CLAHE  image: 

•  Extract  cluster  of  four  neighboring  mapping  functions. 

•  Process  image  region  partly  overlapping  each  of  the 
mapping  tiles. 

•  Extract  a  single  pixel,  apply  four  mappings  to  that  pixel,  and 
interpolate  between  the  results  to  obtain  the  output  pixel. 

•  Repeat  over  entire  image. 
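
The clipping step of the process summarized above can be illustrated with a
short Python sketch. It is a simplified illustration under stated assumptions
and not the MATLAB toolbox code: the function names are hypothetical, the
excess is redistributed in a single pass (so a bin may end slightly above the
limit again), and the clip limit is expressed as a hypothetical factor of 1.5
times the average bin content.

```python
def clip_histogram(counts, clip_limit):
    """Clip each bin at clip_limit and spread the excess evenly over all bins."""
    excess = sum(max(count - clip_limit, 0) for count in counts)
    clipped = [min(count, clip_limit) for count in counts]
    share = excess / len(counts)  # equal redistribution keeps the total count
    return [count + share for count in clipped]


def mapping_from_histogram(counts, levels):
    """Gray level mapping from the cumulative histogram, scaled to [0, levels-1]."""
    total = sum(counts)
    mapping, cumulative = [], 0.0
    for count in counts:
        cumulative += count
        mapping.append((levels - 1) * cumulative / total)
    return mapping


# Hypothetical region histogram: 16 pixels, 12 of them in one background bin
counts = [0, 12, 2, 2]
average = sum(counts) / len(counts)  # average histogram content: 4 pixels per bin
clip_limit = 1.5 * average           # assumed contrast factor of 1.5
clipped = clip_histogram(counts, clip_limit)

before = mapping_from_histogram(counts, levels=4)
after = mapping_from_histogram(clipped, levels=4)
```

The total pixel count is unchanged, while the steepest step of the mapping
(the slope associated with the dominant background bin) is reduced, which is
the contrast-limiting effect described in the text.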


The  CLAHE  process  and  command  can  be  found  in  the  Image  Processing 
Toolbox (version 4.1) of MATLAB (version 6.5, release 13).

The main advantages of the CLAHE transform are its modest
computational  requirements,  ease  of  use  and  excellent  results  on  most  images. 
Figure  20  compares  the  CLAHE  result  to  that  obtained  by  the  standard  histogram 
equalization  method.  The  CLAHE  image  has  less  amplified  noise  and  avoids  the 
brightness  saturation  in  the  standard  histogram  equalization.  Additional 
comparison  samples  are  included  in  Appendix  B. 

CLAHE  does  have  its  limitations.  Since  the  method  is  aimed  at  optimizing 
contrast, there is no direct 1-to-1 relationship between the gray values of the
original  image  and  the  CLAHE  processed  result.  Pixels  of  the  same  gray  level  in 
the  original  image  may  be  mapped  to  different  gray  levels  in  the  output  image, 
because  of  the  equalization  process  and  bilinear  interpolation.  Consequently, 
CLAHE  images  are  not  suited  for  quantitative  measurements  that  rely  on 
physical meaning of image intensity [Zuiderveld, 1994].

Figure 20: Comparison of images obtained from standard histogram
equalization (top image) and from CLAHE (bottom image). The CLAHE
image has less amplified noise and avoids saturation by the bright
source in the image. Figure 8 contains the original image.

III. IMAGE ENHANCEMENT BY CLAHE


A.  SPATIAL  FREQUENCY 

An  image  can  be  expressed  in  both  the  spatial  and  the  frequency 
domains.  The  spatial  domain  is  simply  the  two-dimensional  image  space  which 
contains  an  array  of  pixels  with  intensity  values  representing  the  image.  The 
image  can  be  converted  from  the  spatial  domain  to  the  frequency  domain  by 
Fourier  transform. 

The  periodicity  with  which  the  image  intensity  values  change  is  commonly 
referred  to  as  the  spatial  frequency.  The  image  value  at  each  position  (fx,  fy)  in 
the  frequency  domain  represents  the  amount  by  which  the  intensity  values  in  the 
image  vary  over  a  specific  distance  related  to  the  spatial  frequencies  fx  and  fy  (for 
a 2-dimensional image). For a simple image that is uniformly gray, i.e. one
single gray value in all pixels, there will be no nonzero frequency component in
either the x- or y-direction, although there will still be a zero frequency
component corresponding to the single gray value of the image, or in other
words, the DC component of the image. If there is a change in intensity or gray
level values,
there  will  be  some  frequency  components  along  the  direction  of  change  in  the 
frequency  domain.  There  will  be  only  one  frequency  component  if  the  change  is 
purely  sinusoidal. 

For  example,  suppose  that  there  is  the  value  20  at  the  point  that 
represents  the  frequency  0.1  (or  1  period  every  10  pixels).  This  means  that  in  the 
corresponding  spatial  domain,  the  intensity  values  vary  from  dark  to  light  and 
back  to  dark  over  a  distance  of  10  pixels,  and  that  the  contrast  between  the 
lightest  and  darkest  is  40  gray  levels  (2  times  20). 
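
This example can be reproduced with a one-dimensional Python sketch. It is an
illustration only: the mean level of 128, the row length of 100 pixels and the
function name dft are hypothetical, and the "value" at a frequency is taken
here to be the single-sided amplitude 2|X(k)|/N, which is an assumption about
the normalization.

```python
import cmath
import math


def dft(signal):
    """Direct (slow) discrete Fourier transform of a 1-D sequence."""
    n = len(signal)
    return [sum(signal[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


N = 100  # hypothetical row length in pixels
# One row of pixels: mean level 128, amplitude 20, one period every 10 pixels
signal = [128 + 20 * math.cos(2 * math.pi * 0.1 * t) for t in range(N)]

spectrum = dft(signal)
# Single-sided amplitudes for bins 1 .. N/2 - 1 (bin k is frequency k/N)
amplitudes = [2 * abs(value) / N for value in spectrum[1:N // 2]]
peak_bin = 1 + amplitudes.index(max(amplitudes))
peak_frequency = peak_bin / N
contrast = max(signal) - min(signal)
```

The only significant nonzero-frequency component appears at 0.1 cycles per
pixel with an amplitude of 20, and the difference between the lightest and
darkest pixels is 40 gray levels.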

The  significance  and  correlation  of  the  spatial  frequency  to  the  image  is 
illustrated  in  Figure  21.  A  simple  square-in-square  image  is  generated  with 
different  degrees  of  contrast  against  the  background  as  shown.  For  the  first 
image,  the  background  is  set  at  a  gray  level  of  100  and  the  square  at  128,  while 
for the second image, the background is set at 0 (black) and the square at the
same level of 220. The corresponding spatial frequency spectra are plotted and
the  increased  higher  frequency  components  due  to  the  increased  contrast 
between  object  and  background  are  clearly  shown. 


Figure  21:  A  simple  image  with  its  corresponding  spatial  frequency 
spectrum  and  the  same  image  with  a  higher  contrast  between  object 
and  background,  showing  increased  higher  frequency  components. 


Hence, a high spatial frequency represents a large change in
intensity  or  contrast  over  short  image  distances.  This  can  be  translated  to  edges 
and  sharp  details  in  the  image.  The  larger  the  amplitude  or  the  frequency  power, 
the  greater  the  contrast  change.  The  zero  frequency  in  the  frequency  domain  will 
correspond to the baseline intensity level in the image [HIPR, 2003].

To reinforce this point, the standard test image “Lenna” is used to illustrate
the visual effect of boosting the higher spatial frequencies. The original
gray-scale “Lenna” image (512x512) is converted to the frequency domain and
components beyond the 150th pixel (arbitrarily chosen) away from the zero
frequency are enhanced 250% in magnitude. The resulting image is shown on
the  right  of  Figure  22,  which  has  sharper  details  (e.g.  the  lines  of  the  hat).  Hence, 
increasing  the  power  of  the  higher  frequency  components  enhances  the  edges 
and  sharpens  the  details  in  the  image,  very  much  similar  to  a  high-boost  filter. 
The  bottom  pair  of  images  in  Figure  22  illustrates  the  effect  of  increasing  the  zero 
frequency  component  by  20%  (the  brightness  of  the  image  is  increased). 


Figure  22:  Effect  of  adjusting  spatial  frequency  powers  on  the 
image.  The  top  pair  of  images  illustrates  an  increase  in  the  power  of 
the  higher  frequency  components,  while  the  bottom  pair  represents  an 
increased power in the zero frequency component.

B. IMAGE QUALITY ASSESSMENT


The aim of image processing is naturally to produce better images. But the key
question is how to quantify or measure the term “better” in image quality
assessment. There is no absolute measuring scale like the kilogram in weight or
the meter in distance. The fact remains that the image is ultimately perceived
by a pair of human eyes and interpreted by the human brain for whatever purpose
the image is intended. Hence, the assessment of image quality is always
subjective. There have been attempts to introduce an objective assessment
methodology of image quality, such as mean-square error, probability of
detection and peak signal-to-noise ratio [Barrett, 1990]. But the
basic  difficulty  is  that  images  can  be  used  for  a  variety  of  functions  or  purposes 
(e.g.  classification,  detection  and  measurement).  A  “good”  image  for  one  purpose 
may  not  be  suitable  for  another.  Furthermore,  the  performance  of  the  human 
visual  system  (including  the  human  brain)  is  not  consistent  even  for  the  same 
image,  let  alone  among  individuals.  Experience,  eye-sight,  training,  age,  physical 
conditions  and  fatigue  will  all  affect  the  final  interpretation  of  the  image. 

An image is always produced for a specific purpose or task, and the only
meaningful measure of its quality is how well it fulfills that purpose. An
objective approach to assessment of image quality must therefore start with a
specification of the task and then determine quantitatively how well the task is
performed or achieved [Barrett, 1990]. For example, in assessing the image
quality for image compression, the mean-square error is a relevant and objective
measure of the amount of distortion in the compressed image, as the smaller the
error, the better the image. In the case of night vision images, their main
purpose will be for detection of objects and providing information about the
surroundings, when the human eyes are not sensitive enough under the
low-illumination conditions. A quantitative measure for such a purpose would be
the probability of detection or the time to detection. However, all the II and
TI images used in this thesis are samples provided by the Naval Research
Laboratory, as suitable imagers were not available at the time of the study.
Some of the images contain identifiable objects, such as ships and a fence,
while others are just general outdoor scenes of foliage. There is unfortunately
no “hidden” object implanted in the scenes to measure quantitatively the quality
of the image with respect to its purpose for detection.

Another  objective  measure  of  night  vision  images  could  be  the  number  of 
edges  or  the  intensity  of  the  edges  in  the  image.  With  more  enhanced  edges, 
more  details  and  more  information  can  be  perceived  from  the  image.  As 
discussed  in  the  previous  section,  edges  in  the  image  correspond  to  high  spatial 
frequencies.  Hence,  for  the  same  image,  if  there  is  more  power  in  the  higher 
spatial  frequencies,  the  edges  will  be  enhanced  and  hence,  more  details  will  be 
detectable.  This  is  similar  in  principle  to  the  highpass  filter  in  the  frequency 
domain  as  described  in  Chapter  II.  In  this  respect,  the  quality  of  the  image  can 
therefore  be  judged  to  be  better,  as  the  enhanced  edges  would  improve  the 
information  content  of  the  image,  and  the  increased  power  in  the  high  spatial 
frequencies  can  be  measured  objectively. 
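
One way such a measurement could be made is sketched below in Python. This is
an illustration under stated assumptions rather than the procedure used in
this thesis: the cutoff radius, the 8x8 square-in-square test images and the
function names are all hypothetical, and a direct (slow) 2-D DFT is used so
that the sketch needs no external library.

```python
import cmath
import math


def dft2(image):
    """Direct (slow) 2-D discrete Fourier transform of a small image."""
    rows, cols = len(image), len(image[0])
    return [[sum(image[r][c] * cmath.exp(-2j * math.pi * (u * r / rows + v * c / cols))
                 for r in range(rows) for c in range(cols))
             for v in range(cols)]
            for u in range(rows)]


def high_frequency_power(image, cutoff):
    """Sum |F(u,v)|^2 over all components farther than cutoff from zero frequency."""
    rows, cols = len(image), len(image[0])
    spectrum = dft2(image)
    power = 0.0
    for u in range(rows):
        for v in range(cols):
            # Fold indices so that distance is measured from the zero frequency
            fu, fv = min(u, rows - u), min(v, cols - v)
            if math.hypot(fu, fv) > cutoff:
                power += abs(spectrum[u][v]) ** 2
    return power


def square_image(size, background, square):
    """Square-in-square test image: a centered square on a flat background."""
    image = [[background] * size for _ in range(size)]
    for r in range(size // 4, 3 * size // 4):
        for c in range(size // 4, 3 * size // 4):
            image[r][c] = square
    return image


low_contrast = high_frequency_power(square_image(8, 100, 128), cutoff=2)
high_contrast = high_frequency_power(square_image(8, 0, 220), cutoff=2)
```

For these test images the high-frequency power grows with the square of the
gray level difference between square and background, so the higher-contrast
image yields the larger value.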

C.  ANALYSIS  OF  ENHANCEMENT  RESULTS

A CLAHE-processed night vision image is compared to its original
unprocessed version in Figure 23. The CLAHE-processed image appears to
have “better clarity” as image edges and details have been enhanced by the
CLAHE process. The profiles of the foliage and the river bank are “easier” to
identify. The single small tree in the center of the image is a good example of
the enhancement produced by CLAHE. In theory, this edge enhancement should
be accompanied by increased higher spatial frequency components
in the frequency domain of the image. Our aim is to compare the frequency
spectra of the original and the processed image for increased higher frequencies
and to use this difference as an objective basis for judging improvement in image
quality, instead of relying solely on subjective visual assessment.
Figure 23: Unprocessed (top) and CLAHE-processed (bottom) night vision
images for comparing the improvement in image contrast and
detail enhancement by CLAHE.
1. Spatial Frequency Spectrum

The image is first converted from the spatial domain to the frequency
domain by using the two-dimensional Fast Fourier Transform (FFT) in
MATLAB. The image is zero-padded to 1024x1024 during the FFT process, i.e.
zeros are appended to the rows and columns of the spatial-domain image. This
padding samples the frequency spectrum on a finer grid but does not alter the
underlying spectrum of the image. As the image sizes are 480x640, padding the
image to dimensions that are a power of two (2^10 = 1024) also reduces the FFT
computation time. The Fourier transform is then shifted so that the zero
frequency lies at the center of the spectrum. The frequency power spectrum is
then plotted using the “mesh” command in MATLAB.
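
The padding and centering steps can be illustrated on a toy image. The following is a minimal Python sketch, not the MATLAB code used in this thesis: it uses a naive DFT from the standard library in place of an FFT, and the helper names (dft1, dft2_padded, fftshift2, power_spectrum) are hypothetical. It shows that zero-padding leaves the zero-frequency term (the sum of all pixel values) unchanged, and that the shift moves this term to the center of the array.

```python
import cmath

def dft1(seq):
    """Naive 1-D discrete Fourier transform (O(n^2); fine for toy sizes)."""
    n = len(seq)
    return [sum(seq[k] * cmath.exp(-2j * cmath.pi * u * k / n) for k in range(n))
            for u in range(n)]

def dft2_padded(img, size):
    """2-D DFT of img after zero-padding it to size x size."""
    rows, cols = len(img), len(img[0])
    padded = [[float(img[r][c]) if r < rows and c < cols else 0.0
               for c in range(size)] for r in range(size)]
    row_pass = [dft1(row) for row in padded]               # transform each row
    cols_t = [dft1([row_pass[r][c] for r in range(size)])  # then each column
              for c in range(size)]
    return [[cols_t[c][u] for c in range(size)] for u in range(size)]

def fftshift2(spec):
    """Move the zero-frequency term from index (0, 0) to (n//2, n//2)."""
    n, h = len(spec), len(spec) // 2
    return [[spec[(r - h) % n][(c - h) % n] for c in range(n)] for r in range(n)]

def power_spectrum(img, size):
    """Centered power spectrum |F|^2 of the zero-padded image."""
    return [[abs(v) ** 2 for v in row] for row in fftshift2(dft2_padded(img, size))]

toy = [[1, 2, 3], [4, 5, 6]]              # 2x3 stand-in for a 480x640 image
dc_small = dft2_padded(toy, 4)[0][0]      # zero frequency, padded to 4x4
dc_large = dft2_padded(toy, 8)[0][0]      # zero frequency, padded to 8x8
centered = power_spectrum(toy, 8)         # zero frequency now at [4][4]
```

Both dc_small and dc_large equal the pixel sum of the toy image, whatever the padded size. With 0-based indexing the centered zero frequency falls at index n//2, which is 512 for a 1024-point transform.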

Figure 24 plots the frequency responses of the unprocessed image and the
corresponding CLAHE-processed image shown in Figure 23. There is a clear
increase in the higher frequency components, as shown by the higher
spikes and color-coded profiles in the plots, i.e. there is more power
in the higher spatial frequencies. This observation is consistent with the
edges having been enhanced. Note that the zero frequency is centered at the
location (512,512) as a result of the padding to 1024x1024.

2.  Spectrum  Power  Distribution 

Next, the cumulative power distribution is plotted against the distance from
the central zero frequency (in pixels) to further
examine the frequency power distribution. This computation is accomplished by
superimposing a square window over the frequency spectrum and summing the
power contained within it. The square is centered on the zero frequency,
and the distance is half the side length of the square window.
A contour plot of the frequency spectrum was created with MATLAB to illustrate
the expanding window for computing the total amount of power, as shown in
Figure 25. The contour plots also provide a different viewing aspect for
comparing the frequency spectra of the processed and unprocessed images.
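
The expanding-window summation described above can be sketched as follows. This is an illustrative Python version under stated assumptions, not the MATLAB listing in Appendix A: the function name cumulative_square_power is hypothetical, and the input is taken to be any non-negative, already centered 2-D array (the appendix listing sums the FFT magnitude).

```python
def cumulative_square_power(spectrum, center):
    """Return a list c where c[d] is the sum of `spectrum` over the
    (2d+1) x (2d+1) square window centered on `center` = (row, col)."""
    rows, cols = len(spectrum), len(spectrum[0])
    cr, cc = center
    # Largest half-width for which the window stays inside the array.
    max_d = min(cr, cc, rows - 1 - cr, cols - 1 - cc)
    cumulative = []
    for d in range(max_d + 1):
        total = 0.0
        for r in range(cr - d, cr + d + 1):
            for c in range(cc - d, cc + d + 1):
                total += spectrum[r][c]
        cumulative.append(total)
    return cumulative

# Toy check on a flat 5x5 "spectrum": windows of 1, 9 and 25 cells.
flat = [[1.0] * 5 for _ in range(5)]
curve = cumulative_square_power(flat, (2, 2))   # [1.0, 9.0, 25.0]
```

Re-summing the whole window at each distance is wasteful but mirrors the description directly; summing only the newly added border would give the same curve with less work.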
Figure 24: Frequency spectrum plots of the unprocessed image and
the CLAHE-processed image, showing an increase in the power of
higher frequency components. The maximum peak value is clipped at
5x10^[exponent illegible] to focus on the power distribution beyond the zero
frequency.

Figure 25: Contour plots of the unprocessed image and the CLAHE-
processed image (panels: frequency spectrum plot of original image;
frequency spectrum plot of CLAHE image). The summation process used to
compute the power distribution is illustrated on the top image.

Figure 26: Cumulative spectrum power distribution plots for six pairs
of images, showing an overall increase in total spectrum power and a
higher percentage of power in the higher frequencies.

The cumulative spectrum power distributions of the original and processed
images are plotted in Figure 26. A total of six image pairs were used to give an
indicative trend of the distribution profile. Figure 26 shows that the total spectrum
power has been increased by the CLAHE process, which corresponds here
to increased brightness and contrast in the image. The rate of increase in the
cumulative power in the second half of the curves, i.e. the higher frequencies, is
also steeper for the CLAHE-processed images (the green dotted lines) than for
the original images, as illustrated by the gradient triangles in red. This
difference implies that a higher percentage of power is contained in the
higher frequencies and indicates edge enhancement in the processed images.

Figure 27: Normalized spectrum power distribution plot. The percentage of
power contained in the higher frequencies is higher for the CLAHE-
processed image, as shown by the green profile.


Figure 27 shows that a higher percentage of power is contained in the
higher frequencies (from the 100th pixel onwards for this image) in the CLAHE-
processed image than in the original unprocessed image. We also note that the
percentage of power in the lower frequencies is lower for the processed image;
this is not considered significant, as the vital information, i.e. the edge
content, is contained in the boosted higher frequencies.
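
To make the normalized comparison concrete, the sketch below converts a cumulative window-sum curve into per-distance fractions of the total and into the share of power lying beyond a chosen distance. It is a hedged Python illustration only: the function names are hypothetical, the two curves are invented numbers rather than thesis data, and normalizing by the last cumulative value assumes that the largest window covers essentially the whole spectrum.

```python
def ring_fractions(cumulative):
    """Per-distance power (difference of successive window sums),
    expressed as a fraction of the total power."""
    total = cumulative[-1]
    rings = [cumulative[0]] + [b - a for a, b in zip(cumulative, cumulative[1:])]
    return [p / total for p in rings]

def fraction_beyond(cumulative, distance):
    """Fraction of the total power outside the window of half-width `distance`."""
    return 1.0 - cumulative[distance] / cumulative[-1]

# Invented cumulative curves, for illustration only.
original = [50.0, 80.0, 95.0, 100.0]
processed = [50.0, 90.0, 120.0, 150.0]

share_original = fraction_beyond(original, 1)     # about 0.2
share_processed = fraction_beyond(processed, 1)   # about 0.4
```

In this invented example the processed curve has both a larger total and a larger share of its power beyond the first distance step, which is the pattern the text attributes to the CLAHE-processed images.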

In summary, the results presented in Figures 26 and 27 support the
observation that the CLAHE process has enhanced the image edges and details,
as evident from the boosted higher spatial frequency components. The CLAHE-
enhanced images are therefore judged to be improved.
3. Histogram

The histogram of the CLAHE-processed image is compared with that of its
unprocessed version in Figure 28. The CLAHE-processed image has a more
evenly distributed and wider spread of gray levels, which translates to an
image with better contrast, as seen in the processed image in Figure 22. Since
the amplitude of a spatial frequency component depends on the degree of contrast
change, a larger contrast range in the histogram is linked to increased
spatial frequency components.


Figure 28: Comparison of the histograms of the unprocessed and
the CLAHE-processed image (panel title recovered: Histogram Plot of
Original Image). The images are from Figure 22.

D. SUBJECTIVE ASSESSMENT

The  eventual  user  of  an  image  is  still  the  human  being.  Theoretical  figures 
of  merit  and  engineering  computations  may  be  inadequate  in  predicting  the 
human  response.  Hence,  any  image  quality  assessment  should  still  be  validated 
by  human  subjects  for  acceptance. 

1.  Test  Outline 

A subjective test was conducted to evaluate the image enhancement by
CLAHE. Fifteen students from the Naval Postgraduate School, aged 28 to 38
years old, were approached for the test. Fifteen is the minimum number of test
subjects recommended by the International Telecommunication Union [ITU-R
BT.500-11, 2002]. All subjects were volunteers and signed informed consent forms.
Five of the subjects had no prior experience with night vision images or night
vision devices, while the rest had experience with either the night vision goggle
or the Thermal Imager.

Twenty image pairs, each consisting of one CLAHE-processed and one unprocessed
image of the same scene, were presented to the subjects on a Toshiba TECRA
9100® laptop with 32-bit color and a 1024x768 resolution setting. The brightness
setting of the laptop LCD was 50% and the test was conducted in a dimly
lit room. Subjects were shown two consecutive sequences of the same
image pair and asked to indicate which one of the two
images conveyed the most information or details about the scene. “Most
information” can be interpreted as what allows the subject to see more objects (if
any) or provides better situation awareness of the scene. A choice of
“neutral” could be entered when the subject found that both images were comparable
or that there was no significant difference between the two. The display timing of the
image sequence was set as: three seconds (image 1), one second (blank
screen), three seconds (image 2), followed by a two-second pause before the
same sequence was repeated a second time. Each test lasted approximately
15 minutes.

The order of the processed and unprocessed image in the display
sequence was randomized. Of the 20 image pairs, 5 were thermal images while
the rest were NVD or II images. The thermal image pairs were interspersed
randomly among the II image pairs. Due to the inherent high contrast present in
these thermal images, it was expected that the enhancement by the CLAHE
process would be insignificant and might even degrade the image quality.
Therefore, the thermal images were inserted to break any monotony of choice
that might arise in the experiment.

2.  Results 

Survey results are summarized in Table 2. On average, 75% of the subjects found the
CLAHE-processed night vision images to be more informative and a more
meaningful representation of the scene than the corresponding original
unprocessed images. This finding supports the proposition that the content of the
image has been enhanced by the CLAHE process.

The majority of the subjects did not find the CLAHE-processed thermal
images to be better in providing information. Only about 35% of the subjects
found the processed thermal images to be more effective in providing
information. This result could be due to the fact that the thermal images provided
by NRL already have very good original contrast and, as a result, the contrast
enhancement by CLAHE is not significant. In some cases, the subjects
commented that the image was “over-contrasted”, making the image “unnatural”
and details difficult to identify. An example is shown in Figure 29. The image pair
in Figure 29 is image pair number 10 in the subjective test, which
received the lowest score.

The CLAHE process enhancement is effective on the low-contrast night
vision images, as validated in the subjective testing. Thermal images generally
have better contrast due to suppression of the background by AC coupling during
the filtering process. But there would still be cases of low-contrast thermal
images, such as during dusk and dawn when the background temperature draws
near the object temperature due to the difference in thermal conductivity of object
and background. Therefore, the CLAHE process is still applicable to thermal
imagery.


Table  2.  Subjective  Test  Results 

Figure 29: Unprocessed and processed thermal image pair,
illustrating the minimal improvement by the CLAHE process.

3.  Observations  and  Comments 

a. No Objectivity in Images

Most of the images obtained from the Naval Research Laboratory
are outdoor scenes with no particular object for detection. The general feedback
from the subjects was that it is difficult to judge the information content of an image
without a specific object to look out for, i.e. some specific detail that could be
seen only in the enhanced image and not in the original image. Images with
such characteristics would help make the test more objective. A few of the
subjects entered a “neutral” choice because they could see the same
amount of detail in both original and enhanced images, as both sets of images
contained the same information, even though the processed images appeared
clearer. This explains the relatively lower scores for image pairs 6, 7, 8 and 14 (the
images are available in Appendix B).

Hence,  for  future  subjective  testing,  image  pairs  should  be  created 
(when  the  actual  hardware  is  available)  with  one  or  more  objects  for  detection. 
The  objects  could  be  obscured  by  low-light  or  camouflage  to  reduce  their 
contrast  and  visibility  in  the  original  night  vision  images.  These  objects  would  be 
easier  to  see  and  detect  after  the  CLAHE  enhancement.  A  good  example  is 
the  image  pair  from  Figure  8  (original)  and  Figure  20  (CLAHE-processed).  More 
ships  can  actually  be  seen  with  the  enhancement,  as  agreed  by  86.7%  of  the 
subjects. 

b.  Scanning  versus  Staring 

Some  of  the  subjects  found  the  display  time  for  the  images  to  be 
too  short  for  a  proper  assessment,  which  relates  directly  to  the  issue  of  scanning 
or  staring  assessment.  Scanning  is  more  concerned  with  wide-area  surveillance 
where  the  assessment  time  is  short  and  the  images  are  displayed  real-time;  for 
staring,  the  image  display  is  static.  The  commonality  linking  the  two  is  the  time  to 
detection.  Subjects  would  be  likely  to  take  less  time  to  detect  an  object  when  the 
image  has  better  contrast.  Hence,  the  time  to  detection  could  be  another 
objective measure of the image quality. However, this measure can only be
explored when an object is implanted in the image, as discussed in the
previous section.


c.  Experience  Factor 

Five of the fifteen subjects did not have any prior experience of
viewing night vision images or using night vision devices. When the two groups
of subjects are separated, the average preference for the CLAHE-processed II
images rises to 78% for subjects with night vision experience, as shown in
Table 3, but is only 69% for subjects without any prior experience, as shown in
Table 4. The subjects in the group “without experience” indicated that they found
the enhanced noise and “graininess” in the CLAHE-processed images distracting,
and preferred the original unprocessed images. The noise in question is actually
inherited from the original image and hardware, something that experienced
subjects have already accepted as a general characteristic of night vision images.
Therefore, experience turns out to be a factor in the test results and should not be
overlooked, as the inexperienced group represents new users of night vision
devices. It is also noted that there were more “neutral” choices from the
experienced subjects, which could be explained by the lack of objectivity in the
test images as discussed earlier.
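
As a consistency check, the group averages can be pooled by group size and compared with the overall figures quoted in the Results (about 75% for II images and about 35% for thermal images). The short Python sketch below does this with the averages from Tables 3 and 4 and the group sizes of ten experienced and five inexperienced subjects; the function name pooled_percentage is hypothetical.

```python
def pooled_percentage(groups):
    """Weighted mean of group percentages.

    groups: list of (number_of_subjects, mean_percentage) pairs.
    """
    total_subjects = sum(n for n, _ in groups)
    return sum(n * pct for n, pct in groups) / total_subjects

# (subjects, average preference for the CLAHE-processed image in %)
ii_overall = pooled_percentage([(10, 78.0), (5, 69.3)])   # II images
ti_overall = pooled_percentage([(10, 30.0), (5, 44.0)])   # thermal images

print(f"II: {ii_overall:.1f}%  TI: {ti_overall:.1f}%")
```

The pooled values come out at roughly 75.1% and 34.7%, in line with the overall percentages reported for all fifteen subjects.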

We recommend that, for future studies, the number of subjects be increased and
that equal numbers of experienced and inexperienced viewers be included.
This would allow a more accurate analysis of the acceptance of the
CLAHE enhancement and the influence of experience. The larger subject base
would also better represent the population of users of night vision and thermal
devices.

d.  Original  Image  Quality 

Image pair 4 received a relatively low score for an II image.
Examining the image pair reveals that the original image has reasonably good
contrast due to a light source in the sky. Hence, the enhancement by CLAHE
was not significant, which is similar to the thermal image pairs where the most
common response was a preference for the original image. Therefore,
CLAHE enhancement may not always be necessary.

Table 3. Subjective Test Results (with night vision experience)

Average preference for CLAHE-processed II image    78.0
Average preference for CLAHE-processed TI image    30.0

Table 4. Subjective Test Results (without prior experience)

                      Image Preference (% of subjects)
Image Pair   Type     Processed   Unprocessed   Neutral
 1           II        60.0        20.0          20.0
 2           II        40.0        60.0          -
 3           II       100.0        -             -
 4           II        60.0        40.0          -
 5           TI        40.0        60.0          -
 6           II        60.0        40.0          -
 7           II        60.0        40.0          -
 8           II        80.0        20.0          -
 9           II        80.0        20.0          -
10           TI        20.0        60.0          20.0
11           II        60.0        40.0          -
12           TI        60.0        ?             ?
13           II        80.0        20.0          -
14           II        60.0        40.0          -
15           TI        40.0        40.0          20.0
16           II        60.0        40.0          -
17           II        80.0        20.0          -
18           TI        60.0        40.0          -
19           II        80.0        20.0          -
20           II        80.0        20.0          -

Average preference for CLAHE-processed II image    69.3
Average preference for CLAHE-processed TI image    44.0

Note: for image pair 12, only two entries (60.0 and -) are legible in the
source; the third entry and the column to which the - belongs cannot be
recovered.

IV. CONCLUSIONS AND RECOMMENDATIONS


A.  SUMMARY 

The  CLAHE  algorithm  is  a  digital  contrast  enhancement  technique  that 
emphasizes  local  details  in  the  image  while  limiting  noise  amplification.  This 
process  is  achieved  with  local  histogram  equalization  and  clipping,  followed  by 
bilinear  interpolation. 
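
The clipping step can be illustrated on a single tile histogram. The Python sketch below is a simplified illustration, not the implementation evaluated in this thesis: it handles one tile only, omits the tiling and bilinear interpolation, redistributes the clipped excess in a single uniform pass (so a bin may end up slightly above the limit), and uses a hypothetical function name and an invented 8-level histogram.

```python
def clipped_equalization_map(hist, clip_limit):
    """Clip a histogram, spread the excess evenly over all bins, and
    return (redistributed histogram, gray-level mapping from its CDF)."""
    levels = len(hist)
    clipped = [min(h, clip_limit) for h in hist]
    excess = sum(hist) - sum(clipped)
    share = excess / levels                 # uniform redistribution
    redistributed = [h + share for h in clipped]
    total = sum(redistributed)
    mapping, running = [], 0.0
    for h in redistributed:
        running += h                        # cumulative distribution
        mapping.append(round(running / total * (levels - 1)))
    return redistributed, mapping

# Invented tile histogram with one dominant gray level (index 2).
hist = [0, 0, 90, 6, 4, 0, 0, 0]
plain_hist, plain_map = clipped_equalization_map(hist, clip_limit=100)  # no clipping
clahe_hist, clahe_map = clipped_equalization_map(hist, clip_limit=20)
```

Without clipping, the dominant bin makes the mapping jump by six gray levels between inputs 1 and 2; with the clip limit of 20 the jump is reduced to two levels. Limiting the slope of the mapping in this way is what restrains noise amplification in near-uniform regions.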

CLAHE  contrast  enhancement  has  been  found  to  be  visually  significant, 
and  object  detection  is  improved  with  the  higher  contrast  in  the  images. 
Examining  the  frequency  response  of  the  enhanced  image  reveals  increases  in 
the  higher  spatial  frequencies.  As  higher  spatial  frequencies  correspond  to  edges 
in  the  image,  the  increase  in  power  represents  an  enhancement  of  the  edges  and 
hence, an increase in visible image details. We also conducted a subjective
test in which the majority of the human subjects indicated that the CLAHE-
enhanced images were more informative than the original images.

Results indicated that the CLAHE process is effective in enhancing low-
contrast images. However, the improvement is limited for images with initially
good contrast, such as the thermal images in this study. Nevertheless, TI can still
suffer from low contrast during the day, especially at dusk and dawn.
Therefore, the CLAHE enhancement scheme is still applicable to both night
vision devices (Image Intensifiers) and Thermal Imagers. This enhancement
would be attractive for Image Intensifiers since they are cheaper and more
compact, and their main handicap is their low-contrast imagery.

The  CLAHE  process  can  be  implemented  in  the  form  of  a  computer 
algorithm  or  a  hardware  electronic  chip  in  the  interface  between  the  sensor  and 
display.  No  modification  is  required  on  the  sensor  itself.  The  enhancement  can 
also  be  real-time,  as  the  CLAHE  processing  is  not  demanding.  There  is  still  a 
need  for  an  on/off  switch  or  option  for  the  process  as  not  all  subjects  found  the 
enhancement  beneficial  at  all  times. 

B. RECOMMENDATION FOR FURTHER RESEARCH


1.  Subjective  Test  with  Object  Detection 

A  new  set  of  matching  night  vision  and  thermal  images  containing  specific 
objects  should  be  created.  The  objects  should  be  on  the  threshold  of  visibility  in 
the  unprocessed  image  and  they  should  become  detectable  after  the  CLAHE 
enhancement.  These  image  pairs  can  then  be  used  in  a  larger  or  more  extensive 
subjective test to determine the time to detection for these objects. Such a test
would help quantify the CLAHE improvement more objectively, and potentially
justify its implementation cost.

2.  Image  Fusion 

CLAHE-enhanced  night  vision  images  can  be  fused  with  their  thermal 
counterparts  (with  or  without  enhancement)  to  assess  any  further  improvement  in 
image  quality  using  the  same  frequency  evaluation  and  subjective  testing.  One 
potential fusion algorithm to consider is the nonlinear method proposed by
Scrofani et al. (1997).

APPENDIX A: MATLAB ALGORITHMS


This  Appendix  contains  the  following  MATLAB  source  files: 

1.  Histogram  equalization  (Test8_hist_equal.m). 

2.  Frequency  spectrum  plot  (Test13_power.m). 

% Test8_hist_equal.m
% Histogram equalization
% ===========================
% The input to the file has to be made manually in the m-file and run.
% The output will consist of four histogram plots, the original image
% and the processed image.
% ===========================

Aii = imread('21-l.tif');       % Input test image 21-l.tif
Aorg = Aii;

graylvl = 256;   % note the need to specify gray levels, typically it is 256
lvl = graylvl - 1;

disp('Generating histogram...');

% ===== Generate histogram count =====
for k = 1:graylvl
    n_count(k) = length(find(Aii == k-1));
end

r = [0:1:lvl];                  % graylevels from 0 to 255
r_norm = r./lvl;                % normalized
total = sum(n_count);           % total pixels count
pdf = n_count./total;           % generate probability distribution

s_cdf = pdf;                    % generate cumulative density function
for a = 1:length(r)-1
    s_cdf(a+1) = s_cdf(a+1) + s_cdf(a);
end

s_int = s_cdf.*lvl;             % rescale back to graylevel values
s_lvl = uint8(s_int+1.5);       % convert to integer by removing decimals
s_new = zeros(size(n_count));   % +1 to account for zero graylevel at 1st column

disp('Equalising...');

% ===== Combine count for same gray levels after transformation =====
for count = 1:1:lvl+1
    s_new(s_lvl(count)) = s_new(s_lvl(count)) + n_count(count);
end

s_new = s_new./total;           % normalized new values

% ===== Remap graylevels in image =====
for m = 1:480
    for n = 1:640
        Aii(m,n) = s_lvl(double(Aii(m,n))+1);
    end
end

disp('Transforming image...');

% ===== Counter-check graylevel transformation for equalization =====
for k = 1:graylvl
    n_check(k) = length(find(Aii == k-1));
end

disp('....done.');

% ===== Plot histograms =====
figure(1)
subplot(2,2,1), bar(r,n_count), title('Original histogram'), axis tight;
subplot(2,2,2), bar(r_norm,s_cdf), title('Cdf'), axis tight;
subplot(2,2,3), bar(r_norm,s_new), title('Equalized histogram'), axis tight;
subplot(2,2,4), bar(r,n_check), title('Equalized histogram 2'), axis tight;
% the 3rd histogram is normalized and serves as a counter-check for the
% 4th histogram

figure(2)
imshow(uint8(Aorg),256);
title('Original image')

figure(3)
imshow(uint8(Aii),256);
title('Resultant image')

% ===== end =====

% Test13_power.m
% Plot the spectrum power distribution
% ===========================
% The input to the file has to be made manually in the m-file and run.
% Input A is the original image while input B is the CLAHE processed image.
% The first figure output will be the cumulative spectrum power plot.
% The second figure output is the spectrum power distribution.
% ===========================

clear;

Aii = imread('25-l.tif');       % input original image
Bii = imread('25-lah.tif');     % input CLAHE image

Afft = fft2(Aii,1024,1024);     % fast fourier transform with padding
Afft2 = fftshift(Afft);         % center zero frequency
A2 = abs(Afft2);                % take magnitude of complex

Bfft = fft2(Bii,1024,1024);     % fast fourier transform with padding
Bfft2 = fftshift(Bfft);         % center zero frequency
B2 = abs(Bfft2);                % take magnitude of complex

% find center of spectrum
[n1 x] = max(max(A2,[],1));
[m1 y] = max(max(A2,[],2));

A_total = sum(sum(A2));

[m n] = size(A2);               % find max dimensions of image
dim_max = m - y;

A_array(1) = A2(x,y);
A_arrayc(1) = A2(x,y);

% expanding square and sum
for dim = 1:dim_max
    A_arrayc(dim+1) = 0;
    for a = x-dim:x+dim
        for b = y-dim:y+dim
            A_arrayc(dim+1) = A_arrayc(dim+1) + A2(b,a);
            A_array(dim+1) = A_arrayc(dim+1) - A_arrayc(dim);
        end
    end
end

% find center of spectrum for CLAHE image
[n1b xb] = max(max(B2,[],1));
[m1b yb] = max(max(B2,[],2));

B_total = sum(sum(B2));

[mb nb] = size(B2);
dim_maxb = mb - yb;

B_array(1) = B2(xb,yb);
B_arrayc(1) = B2(xb,yb);

for dimb = 1:dim_maxb
    B_arrayc(dimb+1) = 0;
    for a = xb-dimb:xb+dimb
        for b = yb-dimb:yb+dimb
            B_arrayc(dimb+1) = B_arrayc(dimb+1) + B2(b,a);
            B_array(dimb+1) = B_arrayc(dimb+1) - B_arrayc(dimb);
        end
    end
end

% === Plot cumulative spectrum power distribution ===
figure;
plot(0:511,A_arrayc./A_total,0:511,B_arrayc./B_total)

% === Plot power distribution ===
figure;
plot(0:511,A_array./A_total,0:511,B_array./B_total)
% May have to zoom in the y axis for a better view of the distribution

% ==== end =====

APPENDIX B: CLAHE ENHANCED IMAGES


The following images are results obtained using the Contrast Limited
Adaptive Histogram Equalization (CLAHE) enhancement algorithm. The images
in the left column are the original unprocessed night vision images, while the
images on the right are the CLAHE-processed images. These image pairs were
used in the subjective testing to assess the improvement by the CLAHE method.
The numbering of the image pairs is the same as that used in the subjective test.

Image pair 6

Image pair 8

Image pair 10

Image pair 11